BabelNet Logo
   HOME

TheInfoList



OR:

BabelNet is a
multilingual Multilingualism is the use of more than one language, either by an individual speaker or by a group of speakers. It is believed that multilingual speakers outnumber monolingual speakers in the world's population. More than half of all E ...
lexicalized
semantic network A semantic network, or frame network is a knowledge base that represents semantic relations between concepts in a network. This is often used as a form of knowledge representation. It is a directed or undirected graph consisting of vertices, ...
and
ontology In metaphysics, ontology is the philosophical study of being, as well as related concepts such as existence, becoming, and reality. Ontology addresses questions like how entities are grouped into categories and which of these entities exis ...
developed at the NLP group of the
Sapienza University of Rome The Sapienza University of Rome ( it, Sapienza – Università di Roma), also called simply Sapienza or the University of Rome, and formally the Università degli Studi di Roma "La Sapienza", is a Public university, public research university l ...
.R. Navigli and S. P Ponzetto. 2012
BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network
Artificial Intelligence, 193, Elsevier, pp. 217-250.
BabelNet was automatically created by linking Wikipedia to the most popular computational
lexicon A lexicon is the vocabulary of a language or branch of knowledge (such as nautical or medical). In linguistics, a lexicon is a language's inventory of lexemes. The word ''lexicon'' derives from Koine Greek language, Greek word (), neuter of () ...
of the
English language English is a West Germanic language of the Indo-European language family, with its earliest forms spoken by the inhabitants of early medieval England. It is named after the Angles, one of the ancient Germanic peoples that migrated to the is ...
,
WordNet WordNet is a lexical database of semantic relations between words in more than 200 languages. WordNet links words into semantic relations including synonyms, hyponyms, and meronyms. The synonyms are grouped into '' synsets'' with short definition ...
. The integration is done using an automatic mapping and by filling in lexical gaps in resource-poor
language Language is a structured system of communication. The structure of a language is its grammar and the free components are its vocabulary. Languages are the primary means by which humans communicate, and may be conveyed through a variety of met ...
s by using
statistical machine translation Statistical machine translation (SMT) is a machine translation paradigm where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. The statistical approach contrast ...
. The result is an
encyclopedic dictionary An encyclopedic dictionary typically includes many short listings, arranged alphabetically, and discussing a wide range of topics. Encyclopedic dictionaries can be general, containing articles on topics in many different fields; or they can sp ...
that provides
concept Concepts are defined as abstract ideas. They are understood to be the fundamental building blocks of the concept behind principles, thoughts and beliefs. They play an important role in all aspects of cognition. As such, concepts are studied by s ...
s and named entities
lexicalized In linguistics, lexicalization is the process of adding words, set phrases, or word patterns to a language's lexicon. Whether ''word formation'' and ''lexicalization'' refer to the same process is controversial within the field of linguistics. Mo ...
in many languages and connected with large amounts of
semantic relations Semantics (from grc, σημαντικός ''sēmantikós'', "significant") is the study of reference, meaning, or truth. The term can be used to refer to subfields of several distinct disciplines, including philosophy, linguistics and comput ...
. Additional lexicalizations and definitions are added by linking to free-license wordnets, OmegaWiki, the English
Wiktionary Wiktionary ( , , rhyming with "dictionary") is a multilingual, web-based project to create a free content dictionary of terms (including words, phrases, proverbs, linguistic reconstructions, etc.) in all natural languages and in a number ...
,
Wikidata Wikidata is a collaboratively edited multilingual knowledge graph hosted by the Wikimedia Foundation. It is a common source of open data that Wikimedia projects such as Wikipedia, and anyone else, can use under the CC0 public domain license. ...
,
FrameNet FrameNet is a research and resource development project based at the International Computer Science Institute (ICSI) in Berkeley, California, which has produced an electronic resource based on a theory of meaning called frame semantics. The data ...
,
VerbNet The VerbNet project maps PropBank PropBank is a corpus that is annotated with verbal propositions and their arguments—a "proposition bank". Although "PropBank" refers to a specific corpus produced by Martha Palmer ''et al.'', the term ''pro ...
and others. Similarly to WordNet, BabelNet groups
word A word is a basic element of language that carries an semantics, objective or pragmatics, practical semantics, meaning, can be used on its own, and is uninterruptible. Despite the fact that language speakers often have an intuitive grasp of w ...
s in different languages into sets of
synonyms A synonym is a word, morpheme, or phrase that means exactly or nearly the same as another word, morpheme, or phrase in a given language. For example, in the English language, the words ''begin'', ''start'', ''commence'', and ''initiate'' are all ...
, called ''Babel synsets''. For each Babel synset, BabelNet provides short definitions (called
glosses A gloss is a brief notation, especially a marginal one or an interlinear one, of the meaning of a word or wording in a text. It may be in the language of the text or in the reader's language if that is different. A collection of glosses is a ''g ...
) in many languages harvested from both WordNet and Wikipedia.


Statistics of BabelNet

, BabelNet (version 5.0) covers 500
language Language is a structured system of communication. The structure of a language is its grammar and the free components are its vocabulary. Languages are the primary means by which humans communicate, and may be conveyed through a variety of met ...
s. It contains almost 20 million synsets and around 1.4 billion
word sense In linguistics, a word sense is one of the meanings of a word. For example, a dictionary may have over 50 different senses of the word "play", each of these having a different meaning based on the context of the word's usage in a sentence, as fo ...
s (regardless of their language). Each Babel synset contains 2 synonyms per language, i.e., word senses, on average. The semantic network includes all the lexico-semantic relations from WordNet ( hypernymy and hyponymy,
meronymy In linguistics, meronymy () is a semantic relation between a meronym denoting a part and a holonym denoting a whole. In simpler terms, a meronym is in a ''part-of'' relationship with its holonym. For example, ''finger'' is a meronym of ''hand'' ...
and
holonymy In linguistics, meronymy () is a semantic relation between a meronym denoting a part and a holonym denoting a whole. In simpler terms, a meronym is in a ''part-of'' relationship with its holonym. For example, ''finger'' is a meronym of ''hand'' ...
,
antonymy In lexical semantics, opposites are words lying in an inherently incompatible binary relationship. For example, something that is ''long'' entails that it is not ''short''. It is referred to as a 'binary' relationship because there are two members ...
and
synonymy A synonym is a word, morpheme, or phrase that means exactly or nearly the same as another word, morpheme, or phrase in a given language. For example, in the English language, the words ''begin'', ''start'', ''commence'', and ''initiate'' are all ...
, etc., totaling around 364,000 relation edges) as well as an underspecified relatedness relation from Wikipedia (totaling around 1.3 billion edges). Version 5.0 also associates around 51 million images with Babel synsets and provides a Lemon RDF encoding of the resource, available via a SPARQL endpoint. 2.67 million synsets are assigned domain labels.


Applications

BabelNet has been shown to enable multilingual
Natural Language Processing Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to pro ...
applications. The lexicalized
knowledge Knowledge can be defined as awareness of facts or as practical skills, and may also refer to familiarity with objects or situations. Knowledge of facts, also called propositional knowledge, is often defined as true belief that is distinc ...
available in BabelNet has been shown to obtain state-of-the-art results in: *
semantic relatedness Semantic similarity is a metric defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or semantic content as opposed to lexicographical similarity. These are mathematical tools ...
* multilingual
Word Sense Disambiguation Word-sense disambiguation (WSD) is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious/automatic but can often come to consci ...
* multilingual Word Sense Disambiguation and
Entity Linking In natural language processing, entity linking, also referred to as named-entity linking (NEL), named-entity disambiguation (NED), named-entity recognition and disambiguation (NERD) or named-entity normalization (NEN) is the task of assigning a uni ...
with the
Babelfy Babelfy is a software algorithm for the disambiguation of text written in any language. Specifically, Babelfy performs the tasks of multilingual Word Sense Disambiguation (i.e., the disambiguation of common nouns, verbs, adjectives and adverbs) ...
system * video games with a purpose


Prizes and acknowledgments

BabelNet received th
META prize
2015 for "groundbreaking work in overcoming language barriers through a multilingual lexicalised semantic network and ontology making use of heterogeneous data sources". BabelNet featured prominently in a ''
Time Time is the continued sequence of existence and events that occurs in an apparently irreversible succession from the past, through the present, into the future. It is a component quantity of various measurements used to sequence events, to ...
'' magazine articleSteinmetz, Katy (May 12, 2016)
"Redefining the Modern Dictionary"
''
Time Time is the continued sequence of existence and events that occurs in an apparently irreversible succession from the past, through the present, into the future. It is a component quantity of various measurements used to sequence events, to ...
''. 187: 20-21.
about the new age of innovative and up-to-date lexical knowledge resources available on the Web.


See also

*
Babelfy Babelfy is a software algorithm for the disambiguation of text written in any language. Specifically, Babelfy performs the tasks of multilingual Word Sense Disambiguation (i.e., the disambiguation of common nouns, verbs, adjectives and adverbs) ...
*
EuroWordNet EuroWordNet is a system of semantic networks for European languages, based on WordNet. Each language develops its own wordnet but they are interconnected with ''interlingual links'' stored in the ''Interlingual Index'' (ILI). Unlike the original ...
* Knowledge acquisition *
Linguistic Linked Open Data In natural language processing, linguistics, and neighboring fields, Linguistic Linked Open Data (LLOD) describes a method and an interdisciplinary community concerned with creating, sharing, and (re-)using language resources in accordance with L ...
*
Semantic network A semantic network, or frame network is a knowledge base that represents semantic relations between concepts in a network. This is often used as a form of knowledge representation. It is a directed or undirected graph consisting of vertices, ...
*
Semantic relatedness Semantic similarity is a metric defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or semantic content as opposed to lexicographical similarity. These are mathematical tools ...
*
Wikidata Wikidata is a collaboratively edited multilingual knowledge graph hosted by the Wikimedia Foundation. It is a common source of open data that Wikimedia projects such as Wikipedia, and anyone else, can use under the CC0 public domain license. ...
*
Wiktionary Wiktionary ( , , rhyming with "dictionary") is a multilingual, web-based project to create a free content dictionary of terms (including words, phrases, proverbs, linguistic reconstructions, etc.) in all natural languages and in a number ...
*
Word sense disambiguation Word-sense disambiguation (WSD) is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious/automatic but can often come to consci ...
*
Word sense induction In computational linguistics, word-sense induction (WSI) or discrimination is an open problem of natural language processing, which concerns the automatic identification of the word sense, senses of a word (i.e. meaning (linguistics), meanings). Giv ...
*
UBY UBY is a large-scale lexical-semantic resource for natural language processing (NLP) developed at the Ubiquitous Knowledge Processing Lab (UKP) in the department of Computer Science of the Technische Universität Darmstadt . UBY is based on th ...


References


External links

* {{Natural language processing Lexical databases Knowledge bases Ontology (information science) Knowledge representation Computational linguistics Online dictionaries Multilingualism